Automatic Speech Recognition: An Improved Paradigm
نویسندگان
چکیده
In this paper we present a short survey of automatic speech recognition systems underlining the current achievements and capabilities of current day solutions as well as their inherent limitations and shortcomings. In response to which we propose an improved paradigm and algorithm for building an automatic speech recognition system that actively adapts its recognition model in an unsupervised fashion by listening to continuous human speech. The paradigm relies on creating a semi-autonomous system that samples continuous human speech in order to record phonetic units. Then processes those phoneme sized samples to identify the degree of similarity of each sample that will allow the detection of the same phoneme across many samples. After a sufficiently large database of samples has been gathered the system clusters the samples based on their degree of similarity, creating a different cluster for each phoneme. After that the system trains one neural network for each cluster using the samples in that cluster. After a few iterations of sampling, processing, clustering and training the system should contain a neural network detector for each phoneme unit of the spoken language that the system has been exposed to, and be able to use these detectors to recognize phonemes from live speech. Finally we provide the structure and algorithms for this novel automatic speech recognition paradigm.
منابع مشابه
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملSpeech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
متن کاملUse of Procedural Knowledge for Automatic Speech Recognition
A paradigm for automatic speech recognition using networks of actions performing variable depth analysis is presented. The paradigm produces descriptions of speech properties that are related to speech units through Markov models representing system performance. Preliminary results in the recognition of isolated letters and digits are presented. 1. I N T R O D U C T I O N Recent results on Auto...
متن کاملDesigning and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...
متن کاملAuditory-based Acoustic Distinctive Features and Spectral Cues for Robust Automatic Speech Recognition in Low-SNR Car Environments
In this paper, a multi-stream paradigm is proposed to improve the performance of automatic speech recognition (ASR) systems in the presence of highly interfering car noise. It was found that combining the classical MFCCs with some auditory-based acoustic distinctive cues and the main formant frequencies of a speech signal using a multi-stream paradigm leads to an improvement in the recognition ...
متن کامل